COMPUTING MACHINE HAVING IMPROVED COMPUTING ARCHITECTURE
AND RELATED SYSTEM AND METHOD

Claim of Priority

[1] This application claims priority to U.S. Provisional Application Serial
No. 60/422,603, filed on October 31, 2002, which is incorporated by reference.

Cross Reference to Related Applications

[2] This application is related to U.S. Patent App. Serial Nos. 10/684,102
entitled IMPROVED COMPUTING ARCHITECTURE AND RELATED SYSTEM AND
METHOD; 10/683,929 entitled PIPELINE ACCELERATOR FOR IMPROVED
COMPUTING ARCHITECTURE AND RELATED SYSTEM AND METHOD;
10/684,057 entitled PROGRAMMABLE CIRCUIT AND RELATED COMPUTING
MACHINE AND METHOD; and 10/683,932 entitled PIPELINE ACCELERATOR
HAVING MULTIPLE PIPELINE UNITS AND RELATED COMPUTING MACHINE
AND METHOD; all filed on October 9, 2003, and having a common owner, and
which are incorporated by reference.

Background

[3] A common computing architecture for processing relatively large
amounts of data in a relatively short period of time includes multiple interconnected
processors that share the processing burden. By sharing the processing burden,
these multiple processors can often process the data more quickly than a single
processor can for a given clock frequency. For example, each of the processors can
process a respective portion of the data or execute a respective portion of a
processing algorithm.

[4] FIG. 1 is a schematic block diagram of a conventional computing
machine 10 having a multi-processor architecture. The machine 10 includes a
master processor 12 and coprocessors 14_1 - 14_n, which communicate with each
other and the master processor via a bus 16, an input port 18 for receiving raw data
from a remote device (not shown in FIG. 1), and an output port 20 for providing
processed data to the remote source. The machine 10 also includes a memory 22
for the master processor 12, respective memories 24_1 - 24_n for the coprocessors
14_1 - 14_n, and a memory 26 that the master processor and coprocessors share via
the bus 16. The memory 22 serves as both a program and a working memory for the
master processor 12, and each memory 24_1 - 24_n serves as both a program and a
working memory for a respective coprocessor 14_1 - 14_n. The shared memory 26
allows the master processor 12 and the coprocessors 14 to transfer data among
themselves, and from/to the remote device via the ports 18 and 20, respectively.
The master processor 12 and the coprocessors 14 also receive a common clock
signal that controls the speed at which the machine 10 processes the raw data.

[5] In general, the computing machine 10 effectively divides the
processing of raw data among the master processor 12 and the coprocessors 14.
The remote source (not shown in FIG. 1), such as a sonar array, loads the raw data
via the port 18 into a section of the shared memory 26, which acts as a
first-in-first-out (FIFO) buffer (not shown) for the raw data. The master processor 12
retrieves the raw data from the memory 26 via the bus 16, and then the master
processor and the coprocessors 14 process the raw data, transferring data among
themselves as necessary via the bus 16. The master processor 12 loads the
processed data into another FIFO buffer (not shown) defined in the shared memory
26, and the remote source retrieves the processed data from this FIFO via the port
20.

[6] In an example of operation, the computing machine 10 processes the
raw data by sequentially performing n + 1 respective operations on the raw data,
where these operations together compose a processing algorithm such as a Fast
Fourier Transform (FFT). More specifically, the machine 10 forms a data-processing
pipeline from the master processor 12 and the coprocessors 14. For a given
frequency of the clock signal, such a pipeline often allows the machine 10 to process
the raw data faster than a machine having only a single processor.

[7] After retrieving the raw data from the raw-data FIFO (not shown) in the
memory 26, the master processor 12 performs a first operation, such as a
trigonometric function, on the raw data. This operation yields a first result, which the
processor 12 stores in a first-result FIFO (not shown) defined within the memory 26.
Typically, the processor 12 executes a program stored in the memory 22, and
performs the above-described actions under the control of the program. The
processor 12 may also use the memory 22 as working memory to temporarily store
data that the processor generates at intermediate intervals of the first operation.

[8] Next, after retrieving the first result from the first-result FIFO (not
shown) in the memory 26, the coprocessor 14_1 performs a second operation, such
as a logarithmic function, on the first result. This second operation yields a second
result, which the coprocessor 14_1 stores in a second-result FIFO (not shown) defined
within the memory 26. Typically, the coprocessor 14_1 executes a program stored in
the memory 24_1, and performs the above-described actions under the control of the
program. The coprocessor 14_1 may also use the memory 24_1 as working memory to
temporarily store data that the coprocessor generates at intermediate intervals of the
second operation.

[9] Then, the coprocessors 14_2 - 14_n sequentially perform third through n-th
operations on the second through (n-1)-th results in a manner similar to that discussed
above for the coprocessor 14_1.

[10] The n-th operation, which is performed by the coprocessor 14_n, yields
the final result, i.e., the processed data. The coprocessor 14_n loads the processed
data into a processed-data FIFO (not shown) defined within the memory 26, and the
remote device (not shown in FIG. 1) retrieves the processed data from this FIFO.

[11] Because the master processor 12 and coprocessors 14 are
simultaneously performing different operations of the processing algorithm, the
computing machine 10 is often able to process the raw data faster than a computing
machine having a single processor that sequentially performs the different
operations. Specifically, the single processor cannot retrieve a new set of the raw
data until it performs all n + 1 operations on the previous set of raw data. But using
the pipeline technique discussed above, the master processor 12 can retrieve a new
set of raw data after performing only the first operation. Consequently, for a given
clock frequency, this pipeline technique can increase the speed at which the
machine 10 processes the raw data by a factor of approximately n + 1 as compared
to a single-processor machine (not shown in FIG. 1).
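
The factor of approximately n + 1 can be checked with a simple count of time units. The following Python sketch is illustrative only and is not part of the described machine. It assumes that every operation takes one time unit and that data passes between stages with no overhead; the function names are hypothetical.

```python
def sequential_time(num_data_sets: int, num_operations: int) -> int:
    """Time units for a single processor that performs every operation on
    one data set before it retrieves the next set."""
    return num_data_sets * num_operations


def pipelined_time(num_data_sets: int, num_operations: int) -> int:
    """Time units for a pipeline with one processor per operation.

    The first result appears after num_operations units; after that, one
    result is completed per unit.
    """
    if num_data_sets == 0:
        return 0
    return num_operations + (num_data_sets - 1)


def speedup(num_data_sets: int, num_operations: int) -> float:
    """Ratio of single-processor time to pipelined time."""
    return sequential_time(num_data_sets, num_operations) / pipelined_time(
        num_data_sets, num_operations
    )
```

Under these assumptions, n + 1 = 4 operations on 1000 data sets take 4000 units sequentially and 1003 units in the pipeline, and the ratio approaches 4 as the number of data sets grows.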

[12] Alternatively, the computing machine 10 may process the raw data in
parallel by simultaneously performing n + 1 instances of a processing algorithm,
such as an FFT, on the raw data. That is, if the algorithm includes n + 1 sequential
operations as described above in the previous example, then each of the master
processor 12 and the coprocessors 14 sequentially performs all n + 1 operations on
respective sets of the raw data. Consequently, for a given clock frequency, this
parallel-processing technique, like the above-described pipeline technique, can
increase the speed at which the machine 10 processes the raw data by a factor of
approximately n + 1 as compared to a single-processor machine (not shown in FIG.
1).

[13] Unfortunately, although the computing machine 10 can process data
more quickly than a single-processor computing machine (not shown in FIG. 1), the
data-processing speed of the machine 10 is often significantly less than the
frequency of the processor clock. Specifically, the data-processing speed of the
computing machine 10 is limited by the time that the master processor 12 and
coprocessors 14 require to process data. For brevity, an example of this speed
limitation is discussed in conjunction with the master processor 12, although it is
understood that this discussion also applies to the coprocessors 14. As discussed
above, the master processor 12 executes a program that controls the processor to
manipulate data in a desired manner. This program includes a sequence of
instructions that the processor 12 executes. Unfortunately, the processor 12
typically requires multiple clock cycles to execute a single instruction, and often must
execute multiple instructions to process a single value of data. For example,
suppose that the processor 12 is to multiply a first data value A (not shown) by a
second data value B (not shown). During a first clock cycle, the processor 12
retrieves a multiply instruction from the memory 22. During second and third clock
cycles, the processor 12 respectively retrieves A and B from the memory 26. During
a fourth clock cycle, the processor 12 multiplies A and B, and, during a fifth clock
cycle, stores the resulting product in the memory 22 or 26 or provides the resulting
product to the remote device (not shown). This is a best-case scenario, because in
many cases the processor 12 requires additional clock cycles for overhead tasks
such as initializing and closing counters. Therefore, at best the processor 12
requires five clock cycles, or an average of 2.5 clock cycles per data value, to
process A and B.
[14] Consequently, the speed at which the computing machine 10
processes data is often significantly lower than the frequency of the clock that drives
the master processor 12 and the coprocessors 14. For example, if the processor 12
is clocked at 1.0 Gigahertz (GHz) but requires an average of 2.5 clock cycles per
data value, then the effective data-processing speed equals (1.0 GHz)/2.5 = 0.4
GHz. This effective data-processing speed is often characterized in units of
operations per second. Therefore, in this example, for a clock speed of 1.0 GHz, the
processor 12 would be rated with a data-processing speed of 0.4
Gigaoperations/second (Gops).
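
The arithmetic of this example can be written out as a short Python sketch. It is only a restatement of the numbers above; the function and variable names are hypothetical.

```python
def effective_rate_hz(clock_hz: float, cycles_per_value: float) -> float:
    """Data values processed per second for a given clock frequency."""
    return clock_hz / cycles_per_value


# Five clock cycles to process the two values A and B: 2.5 cycles per value.
cycles_per_value = 5 / 2

# A 1.0 GHz clock then yields 0.4 Gigaoperations/second.
rate_gops = effective_rate_hz(1.0e9, cycles_per_value) / 1e9
```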

[15] FIG. 2 is a block diagram of a hardwired data pipeline 30 that can
typically process data faster than a processor can for a given clock frequency, and
often at substantially the same rate at which the pipeline is clocked. The pipeline 30
includes operator circuits 32_1 - 32_n that each perform a respective operation on
respective data without executing program instructions. That is, the desired
operation is "burned in" to a circuit 32 such that it implements the operation
automatically, without the need of program instructions. By eliminating the overhead
associated with executing program instructions, the pipeline 30 can typically perform
more operations per second than a processor can for a given clock frequency.

[16] For example, the pipeline 30 can often solve the following equation
faster than a processor can for a given clock frequency:

    Y(x_k) = (5x_k + 3) 2^{x_k}

where x_k represents a sequence of raw data values. In this example, the operator
circuit 32_1 is a multiplier that calculates 5x_k, the circuit 32_2 is an adder that calculates
5x_k + 3, and the circuit 32_n (n = 3) is a multiplier that calculates (5x_k + 3) 2^{x_k}.

[17] During a first clock cycle k = 1, the circuit 32_1 receives data value x_1 and
multiplies it by 5 to generate 5x_1.

[18] During a second clock cycle k = 2, the circuit 32_2 receives 5x_1 from the
circuit 32_1 and adds 3 to generate 5x_1 + 3. Also, during the second clock cycle, the
circuit 32_1 generates 5x_2.

[19] During a third clock cycle k = 3, the circuit 32_3 receives 5x_1 + 3
from the circuit 32_2 and multiplies by 2^{x_1} (effectively left shifts 5x_1 + 3 by x_1) to
generate the first result (5x_1 + 3) 2^{x_1}. Also during the third clock cycle, the circuit 32_1
generates 5x_3 and the circuit 32_2 generates 5x_2 + 3.

[20] The pipeline 30 continues processing subsequent raw data values x_k in
this manner until all the raw data values are processed.

[21] Consequently, a delay of two clock cycles after receiving a raw data
value x_1 (this delay is often called the latency of the pipeline 30), the pipeline
generates the result (5x_1 + 3) 2^{x_1}, and thereafter generates one result, e.g.,
(5x_2 + 3) 2^{x_2}, (5x_3 + 3) 2^{x_3}, ..., (5x_n + 3) 2^{x_n}, each clock cycle.
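
The cycle-by-cycle behavior described above can be modeled in software. The following Python sketch is a simulation under stated assumptions, not a description of the circuits 32 themselves: it assumes non-negative integer data values (so that multiplying by 2^{x_k} is a left shift), one register between adjacent stages, and one new data value per clock cycle. The name run_pipeline is hypothetical.

```python
def run_pipeline(xs):
    """Clock a three-stage pipeline that computes (5*x + 3) * 2**x.

    Returns a list of (clock_cycle, result) pairs in output order.
    """
    pending = list(xs)
    stage1 = None  # output register of the multiply-by-5 stage: (x, 5x)
    stage2 = None  # output register of the add-3 stage: (x, 5x + 3)
    outputs = []
    cycle = 0
    while pending or stage1 is not None or stage2 is not None:
        cycle += 1
        # Third stage: shift the value latched by the second stage last cycle.
        if stage2 is not None:
            x, partial = stage2
            outputs.append((cycle, partial << x))
        # Second stage: add 3 to the value latched by the first stage.
        stage2 = (stage1[0], stage1[1] + 3) if stage1 is not None else None
        # First stage: accept a new raw data value and multiply it by 5.
        if pending:
            x = pending.pop(0)
            stage1 = (x, 5 * x)
        else:
            stage1 = None
    return outputs
```

For the input values 1, 2, 3, this model produces the first result in the third clock cycle (a latency of two cycles after the first value is received) and one further result in each following cycle.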

[22] Disregarding the latency, the pipeline 30 thus has a data-processing
speed equal to the clock speed. In comparison, assuming that the master processor
12 and coprocessors 14 (FIG. 1) have data-processing speeds that are 0.4 times the
clock speed as in the above example, the pipeline 30 can process data 2.5 times
faster than the computing machine 10 (FIG. 1) for a given clock speed.

[23] Still referring to FIG. 2, a designer may choose to implement the
pipeline 30 in a programmable logic IC (PLIC), such as a field-programmable gate
array (FPGA), because a PLIC allows more design and modification flexibility than
does an application specific IC (ASIC). To configure the hardwired connections
within a PLIC, the designer merely sets interconnection-configuration registers
disposed within the PLIC to predetermined binary states. The combination of all
these binary states is often called "firmware." Typically, the designer loads this
firmware into a nonvolatile memory (not shown in FIG. 2) that is coupled to the PLIC.
When one "turns on" the PLIC, it downloads the firmware from the memory into the
interconnection-configuration registers. Therefore, to modify the functioning of the
PLIC, the designer merely modifies the firmware and allows the PLIC to download
the modified firmware into the interconnection-configuration registers. This ability to
modify the PLIC by merely modifying the firmware is particularly useful during the
prototyping stage and for upgrading the pipeline 30 "in the field".

[24] Unfortunately, the hardwired pipeline 30 typically cannot execute all
algorithms, particularly those that entail significant decision making. A processor can
typically execute a decision-making instruction (e.g., conditional instructions such as
"if A, then go to B, else go to C") approximately as fast as it can execute an
operational instruction (e.g., "A + B") of comparable length. But although the pipeline
30 may be able to make a relatively simple decision (e.g., "A > B?"), it typically
cannot execute a relatively complex decision (e.g., "if A, then go to B, else go to C").
And although one may be able to design the pipeline 30 to execute such a complex
decision, the size and complexity of the required circuitry often make such a design
impractical, particularly where an algorithm includes multiple different complex
decisions.

[25] Consequently, processors are typically used in applications that require
significant decision making, and hardwired pipelines are typically limited to "number
crunching" applications that entail little or no decision making.

[26] Furthermore, as discussed below, it is typically much easier for one to
design/modify a processor-based computing machine, such as the computing
machine 10 of FIG. 1, than it is to design/modify a hardwired pipeline such as the
pipeline 30 of FIG. 2, particularly where the pipeline 30 includes multiple PLICs.

[27] Computing components, such as processors and their peripherals
(e.g., memory), typically include industry-standard communication interfaces that
facilitate the interconnection of the components to form a processor-based
computing machine.

[28] Typically, a standard communication interface includes two layers: a
physical layer and a service layer.

[29] The physical layer includes the circuitry and the corresponding circuit
interconnections that form the interface and the operating parameters of this
circuitry. For example, the physical layer includes the pins that connect the
component to a bus, the buffers that latch data received from the pins, and the
drivers that drive data onto the pins. The operating parameters include the
acceptable voltage range of the data signals that the pins receive, the signal timing
for writing and reading data, and the supported modes of operation (e.g., burst
mode, page mode). Conventional physical layers include transistor-transistor logic
(TTL) and RAMBUS.

[30] The service layer includes the protocol by which a computing
component transfers data. The protocol defines the format of the data and the
manner in which the component sends and receives the formatted data.
Conventional communication protocols include file-transfer protocol (FTP) and
transmission control protocol/internet protocol (TCP/IP).

[31] Consequently, because manufacturers and others typically design
computing components having industry-standard communication interfaces, one can
typically design the interface of such a component and interconnect it to other
computing components with relatively little effort. This allows one to devote most of
his time to designing the other portions of the computing machine, and to easily
modify the machine by adding or removing components.

[32] Designing a computing component that supports an industry-standard
communication interface allows one to save design time by using an existing
physical-layer design from a design library. This also ensures that he/she can easily
interface the component to off-the-shelf computing components.

[33] And designing a computing machine using computing components that
support a common industry-standard communication interface allows the designer to
interconnect the components with little time and effort. Because the components
support a common interface, the designer can interconnect them via a system bus
with little design effort. And because the supported interface is an industry standard,
one can easily modify the machine. For example, one can add different components
and peripherals to the machine as the system design evolves, or can easily
add/design next-generation components as the technology evolves. Furthermore,
because the components support a common industry-standard service layer, one
can incorporate into the computing machine's software an existing software module
that implements the corresponding protocol. Therefore, one can interface the
components with little effort because the interface design is essentially already in
place, and thus can focus on designing the portions (e.g., software) of the machine
that cause the machine to perform the desired function(s).

[34] But unfortunately, there are no known industry-standard
communication interfaces for components, such as PLICs, used to form hardwired
pipelines such as the pipeline 30 of FIG. 2.
[35] Consequently, to design a pipeline having multiple PLICs, one typically
spends a significant amount of time and exerts a significant effort designing and
debugging the communication interface between the PLICs "from scratch."
Typically, such an ad hoc communication interface depends on the parameters of
the data being transferred between the PLICs. Likewise, to design a pipeline that
interfaces to a processor, one would have to spend a significant amount of time and
exert a significant effort in designing and debugging the communication interface
between the pipeline and the processor from scratch.

[36] Similarly, to modify such a pipeline by adding a PLIC to it, one typically
spends a significant amount of time and exerts a significant effort designing and
debugging the communication interface between the added PLIC and the existing
PLICs. Likewise, to modify a pipeline by adding a processor, or to modify a
computing machine by adding a pipeline, one would have to spend a significant
amount of time and exert a significant effort in designing and debugging the
communication interface between the pipeline and processor.

[37] Consequently, referring to FIGS. 1 and 2, because of the difficulties in
interfacing multiple PLICs and in interfacing a processor to a pipeline, one is often
forced to make significant tradeoffs when designing a computing machine. For
example, with a processor-based computing machine, one is forced to trade
number-crunching speed and design/modification flexibility for complex
decision-making ability. Conversely, with a hardwired pipeline-based computing
machine, one is forced to trade complex-decision-making ability and
design/modification flexibility for number-crunching speed. Furthermore, because of
the difficulties in interfacing multiple PLICs, it is often impractical for one to design a
pipeline-based machine having more than a few PLICs. As a result, a practical
pipeline-based machine often has limited functionality. And because of the
difficulties in interfacing a processor to a PLIC, it would be impractical to interface a
processor to more than one PLIC. As a result, the benefits obtained by combining a
processor and a pipeline would be minimal.

[38] Therefore, a need has arisen for a new computing architecture that
allows one to combine the decision-making ability of a processor-based machine
with the number-crunching speed of a hardwired-pipeline-based machine.

[39] In an embodiment of the invention, a computing machine includes a
first buffer and a processor coupled to the buffer. The processor is operable to
execute an application, a first data-transfer object, and a second data-transfer object,
publish data under the control of the application, load the published data into the
buffer under the control of the first data-transfer object, and retrieve the published
data from the buffer under the control of the second data-transfer object.

[40] According to another embodiment of the invention, the processor is
operable to retrieve data and load the retrieved data into the buffer under the control
of the first data-transfer object, unload the data from the buffer under the control of
the second data-transfer object, and process the unloaded data under the control of
the application.
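
The division of labor among the application, the buffer, and the two data-transfer objects can be illustrated with a minimal Python sketch. It is a sketch under assumptions, not the claimed implementation: every class and function name is hypothetical, the buffer is assumed to be a simple FIFO, and the published data vectors are made-up values.

```python
from collections import deque


class Buffer:
    """FIFO buffer shared by the two data-transfer objects (assumed FIFO)."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def load(self, item) -> None:
        self._items.append(item)

    def unload(self):
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)


class LoadingTransferObject:
    """Stands in for a first data-transfer object: loads published data."""

    def __init__(self, buffer: Buffer) -> None:
        self._buffer = buffer

    def transfer(self, published) -> None:
        self._buffer.load(published)


class RetrievingTransferObject:
    """Stands in for a second data-transfer object: retrieves buffered data."""

    def __init__(self, buffer: Buffer) -> None:
        self._buffer = buffer

    def transfer(self):
        return self._buffer.unload()


def application_publish():
    """Stands in for the application: yields the data vectors it publishes."""
    yield from ([1, 2, 3], [4, 5, 6])


buffer = Buffer()
loader = LoadingTransferObject(buffer)
retriever = RetrievingTransferObject(buffer)

# The application publishes; the first object loads each vector into the buffer.
for vector in application_publish():
    loader.transfer(vector)

# The second object later retrieves the vectors in the order they were loaded.
retrieved = [retriever.transfer() for _ in range(len(buffer))]
```

Because the application touches only its own published data and each transfer object touches only the buffer, either side can be changed without rewriting the other, which is the separation the embodiments above describe.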

[41] Where the computing machine is a peer-vector machine that includes a
hardwired pipeline accelerator coupled to the processor, the buffer and data-transfer
objects facilitate the transfer of data, whether unidirectional or bidirectional,
between the application and the accelerator.

Brief Description of the Drawings

[42] FIG. 1 is a block diagram of a computing machine having a
conventional multi-processor architecture.

[43] FIG. 2 is a block diagram of a conventional hardwired pipeline.

[44] FIG. 3 is a schematic block diagram of a computing machine having a
peer-vector architecture according to an embodiment of the invention.

[45] FIG. 4 is a functional block diagram of the host processor of FIG. 3
according to an embodiment of the invention.

[46] FIG. 5 is a functional block diagram of the data-transfer paths between
the data-processing application and the pipeline bus of FIG. 4 according to an
embodiment of the invention.

[47] FIG. 6 is a functional block diagram of the data-transfer paths between
the accelerator exception manager and the pipeline bus of FIG. 4 according to an
embodiment of the invention.

[48] FIG. 7 is a functional block diagram of the data-transfer paths between
the accelerator configuration manager and the pipeline bus of FIG. 4 according to an
embodiment of the invention.

Detailed Description 

[49] FIG. 3 is a schematic block diagram of a computing machine 40, which
has a peer-vector architecture according to an embodiment of the invention. In
addition to a host processor 42, the peer-vector machine 40 includes a pipeline
accelerator 44, which performs at least a portion of the data processing, and which
thus effectively replaces the bank of coprocessors 14 in the computing machine 10
of FIG. 1. Therefore, the host processor 42 and the accelerator 44 are "peers" that
can transfer data vectors back and forth. Because the accelerator 44 does not
execute program instructions, it typically performs mathematically intensive
operations on data significantly faster than a bank of coprocessors can for a given
clock frequency. Consequently, by combining the decision-making ability of the
processor 42 and the number-crunching ability of the accelerator 44, the machine 40
has the same abilities as, but can often process data faster than, a conventional
computing machine such as the machine 10. Furthermore, as discussed below and
in previously cited U.S. Patent App. Serial No. 10/683,929 entitled PIPELINE
ACCELERATOR FOR IMPROVED COMPUTING ARCHITECTURE AND RELATED
SYSTEM AND METHOD, providing the accelerator 44 with the same communication
interface as the host processor 42 facilitates the design and modification of the
machine 40, particularly where the communication interface is an industry standard.
And where the accelerator 44 includes multiple components (e.g., PLICs), providing
these components with this same communication interface facilitates the design and
modification of the accelerator, particularly where the communication interface is an
industry standard. Moreover, the machine 40 may also provide other advantages as
described below and in the previously cited patent applications.

[50] Still referring to FIG. 3, in addition to the host processor 42 and the
pipeline accelerator 44, the peer-vector computing machine 40 includes a processor
memory 46, an interface memory 48, a bus 50, a firmware memory 52, optional
raw-data input ports 54 and 56, processed-data output ports 58 and 60, and an
optional router 61.

[51] The host processor 42 includes a processing unit 62 and a message
handler 64, and the processor memory 46 includes a processing-unit memory 66
and a handler memory 68, which respectively serve as both program and working
memories for the processing unit and the message handler. The processor memory
46 also includes an accelerator-configuration registry 70 and a
message-configuration registry 72, which store respective configuration data that
allow the host processor 42 to configure the functioning of the accelerator 44 and the
structure of the messages that the message handler 64 sends and receives.

[52] The pipeline accelerator 44 is disposed on at least one PLIC (not
shown) and includes hardwired pipelines 74_1 - 74_n, which process respective data
without executing program instructions. The firmware memory 52 stores the
configuration firmware for the accelerator 44. If the accelerator 44 is disposed on
multiple PLICs, these PLICs and their respective firmware memories may be
disposed on multiple circuit boards, i.e., daughter cards (not shown). The
accelerator 44 and daughter cards are discussed further in previously cited U.S.
Patent App. Serial Nos. 10/683,929 entitled PIPELINE ACCELERATOR FOR
IMPROVED COMPUTING ARCHITECTURE AND RELATED SYSTEM AND
METHOD and 10/683,932 entitled PIPELINE ACCELERATOR HAVING MULTIPLE
PIPELINE UNITS AND RELATED COMPUTING MACHINE AND METHOD.
Alternatively, the accelerator 44 may be disposed on at least one ASIC, and thus
may have internal interconnections that are unconfigurable. In this alternative, the
machine 40 may omit the firmware memory 52. Furthermore, although the
accelerator 44 is shown including multiple pipelines 74, it may include only a single
pipeline. In addition, although not shown, the accelerator 44 may include one or
more processors such as a digital-signal processor (DSP).

[53] The general operation of the peer-vector machine 40 is discussed in
previously cited U.S. Patent App. Serial No. 10/684,102 entitled IMPROVED
COMPUTING ARCHITECTURE AND RELATED SYSTEM AND METHOD, and the
functional topology and operation of the host processor 42 are discussed below in
conjunction with FIGS. 4 - 7. FIG. 4 is a functional block diagram of the host
processor 42 and the pipeline bus 50 of FIG. 3 according to an embodiment of the
invention. Generally, the processing unit 62 executes one or more software
applications, and the message handler 64 executes one or more software objects
that transfer data between the software application(s) and the pipeline accelerator 44
(FIG. 3). Splitting the data-processing, data-transferring, and other functions among
different applications and objects allows for easier design and modification of the
host-processor software. Furthermore, although in the following description a
software application is described as performing a particular operation, it is
understood that in actual operation, the processing unit 62 or message handler 64
executes the software application and performs this operation under the control of
the application. Likewise, although in the following description a software object is
described as performing a particular operation, it is understood that in actual
operation, the processing unit 62 or message handler 64 executes the software
object and performs this operation under the control of the object.

[54] Still referring to FIG. 4, the processing unit 62 executes a
data-processing application 80, an accelerator exception manager application
(hereinafter the exception manager) 82, and an accelerator configuration manager
application (hereinafter the configuration manager) 84, which are collectively referred
to as the processing-unit applications. The data-processing application processes
data in cooperation with the pipeline accelerator 44 (FIG. 3). For example, the
data-processing application 80 may receive raw sonar data via the port 54 (FIG. 3),
parse the data, and send the parsed data to the accelerator 44, and the accelerator
may perform an FFT on the parsed data and return the processed data to the
data-processing application for further processing. The exception manager 82 handles
exception messages from the accelerator 44, and the configuration manager 84
loads the accelerator's configuration firmware into the memory 52 during initialization
of the peer-vector machine 40 (FIG. 3). The configuration manager 84 may also
reconfigure the accelerator 44 after initialization in response to, e.g., a malfunction of
the accelerator. As discussed further below in conjunction with FIGS. 6 - 7, the
processing-unit applications may communicate with each other directly as indicated
by the dashed lines 85, 87, and 89, or may communicate with each other via the
data-transfer objects 86. The message handler 64 executes the data-transfer
objects 86, a communication object 88, and input and output read objects 90 and 92,
and may execute input and output queue objects 94 and 96. The data-transfer
objects 86 transfer data between the communication object 88 and the
processing-unit applications, and may use the interface memory 48 as a data buffer
to allow the processing-unit applications and the accelerator 44 to operate
independently. For example, the memory 48 allows the accelerator 44, which is
often faster than the data-processing application 80, to operate without "waiting" for
the data-processing application. The communication object 88 transfers data
between the data-transfer objects 86 and the pipeline bus 50. The input and output
read objects 90 and 92 control the data-transfer objects 86 as they transfer data
between the communication object 88 and the processing-unit applications. And, when
executed, the input and output queue objects 94 and 96 cause the input and output
read objects 90 and 92 to synchronize this transfer of data according to a desired
priority.

[55] Furthermore, during initialization of the peer-vector machine 40 (FIG. 3),
the message handler 64 instantiates and executes a conventional object factory
98, which instantiates the data-transfer objects 86 from configuration data stored in
the message-configuration registry 72 (FIG. 3). The message handler 64 also
instantiates the communication object 88, the input and output reader objects 90 and
92, and the input and output queue objects 94 and 96 from the configuration data
stored in the message-configuration registry 72. Consequently, one can design and
modify these software objects, and thus their data-transfer parameters, by merely
designing or modifying the configuration data stored in the registry 72. This is
typically less time consuming than designing or modifying each software object
individually.
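
The configuration-driven instantiation described above can be sketched as follows. This is only an illustration under stated assumptions: the `DataTransferObject` class, the `build_transfer_objects` function, and the registry field names (`name`, `buffer_address`, `buffer_size`) are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class DataTransferObject:
    """Hypothetical stand-in for a data-transfer object 86."""
    name: str
    buffer_address: int  # where its buffer sits in the interface memory
    buffer_size: int


def build_transfer_objects(registry):
    """Create one object per registry entry; all parameters come from the data."""
    return {
        entry["name"]: DataTransferObject(
            entry["name"], entry["buffer_address"], entry["buffer_size"]
        )
        for entry in registry
    }


# Hypothetical configuration data, as it might be stored in the registry 72.
REGISTRY = [
    {"name": "86_1a", "buffer_address": 0x1000, "buffer_size": 256},
    {"name": "86_1b", "buffer_address": 0x1000, "buffer_size": 256},
]

objects = build_transfer_objects(REGISTRY)
```

Changing a buffer size or adding a channel in this sketch means editing `REGISTRY` only; the factory code itself stays the same.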

[56] The operation of the host processor 42 of FIG. 4 is discussed below in
conjunction with FIGS. 5 - 7.

Data Processing

[57] FIG. 5 is a functional block diagram of the data-processing application
80, the data-transfer objects 86, and the interface memory 48 of FIG. 4 according to
an embodiment of the invention.

[58] The data-processing application 80 includes a number of threads 100_1
- 100_n, which each perform a respective data-processing operation. For example,
the thread 100_1 may perform an addition, and the thread 100_2 may perform a
subtraction, or both the threads 100_1 and 100_2 may perform an addition.

[59] Each thread 100 generates, i.e., publishes, data destined for the
pipeline accelerator 44 (FIG. 3), receives, i.e., subscribes to, data from the
accelerator, or both publishes and subscribes to data. For example, each of the
threads 100_1 - 100_4 both publishes and subscribes to data from the accelerator 44. A
thread 100 may also communicate directly with another thread 100. For example, as
indicated by the dashed line 102, the threads 100_3 and 100_4 may directly
communicate with each other. Furthermore, a thread 100 may receive data from or
send data to a component (not shown) other than the accelerator 44 (FIG. 3). But
for brevity, discussion of data transfer between the threads 100 and such another
component is omitted.

[60] Still referring to FIG. 5, the interface memory 48 and the data-transfer
objects 86_1a - 86_nb functionally form a number of unidirectional channels 104_1 - 104_n
for transferring data between the respective threads 100 and the communication
object 88. The interface memory 48 includes a number of buffers 106_1 - 106_n, one
buffer per channel 104. The buffers 106 may each hold a single grouping (e.g., byte,
word, block) of data, or at least some of the buffers may be FIFO buffers that can
each store respective multiple groupings of data. There are also two data objects 86
per channel 104, one for transferring data between a respective thread 100 and a
respective buffer 106, and the other for transferring data between the buffer 106 and
the communication object 88. For example, the channel 104_1 includes a buffer 106_1,
a data-transfer object 86_1a for transferring published data from the thread 100_1 to the
buffer 106_1, and a data-transfer object 86_1b for transferring the published data from
the buffer 106_1 to the communication object 88. Including a respective channel 104
for each allowable data transfer reduces the potential for data bottlenecks and also
facilitates the design and modification of the host processor 42 (FIG. 4).
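
As a rough illustration of one such channel, the sketch below models a FIFO buffer with a loading step and a retrieving step. The `Channel` class, its `depth` parameter, and the method names are assumptions made for this example; the specification does not define a software interface for the channels 104.

```python
from collections import deque


class Channel:
    """One unidirectional channel: a FIFO buffer plus two transfer steps."""

    def __init__(self, depth=4):
        self.buffer = deque()  # plays the role of a buffer 106
        self.depth = depth     # hypothetical capacity, in groupings of data

    def load(self, data):
        """Role of an object 86_xa: move published data into the buffer."""
        if len(self.buffer) >= self.depth:
            return False  # buffer full; the caller must retry later
        self.buffer.append(data)
        return True

    def retrieve(self):
        """Role of an object 86_xb: move the oldest data out of the buffer."""
        return self.buffer.popleft() if self.buffer else None
```

Because the loading side and the retrieving side touch only the buffer, neither needs to wait for the other unless the buffer is full or empty.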

[61] Referring to FIGS. 3-5, the operation of the host processor 42 during
its initialization and while executing the data-processing application 80, the
data-transfer objects 86, the communication object 88, and the optional reader and
queue objects 90, 92, 94, and 96 is discussed according to an embodiment of the
invention.

[62] During initialization of the host processor 42, the object factory 98
instantiates the data-transfer objects 86 and defines the buffers 106. Specifically,
the object factory 98 downloads the configuration data from the registry 72 and
generates the software code for each data-transfer object 86_xb that the
data-processing application 80 may need. The identity of the data-transfer objects
86_xb that the application 80 may need is typically part of the configuration data; the
application 80, however, need not use all of the data-transfer objects 86. Then, from
the generated objects 86_xb, the object factory 98 respectively instantiates the data
objects 86_xa. Typically, as discussed in the example below, the object factory 98
instantiates data-transfer objects 86_xa and 86_xb that access the same buffer 106 as
multiple instances of the same software code. This reduces the amount of code that
the object factory 98 would otherwise generate by approximately one half.
Furthermore, the message handler 64 may determine which, if any, data-transfer
objects 86 the application 80 does not need, and delete the instances of these
unneeded data-transfer objects to save memory. Alternatively, the message handler
64 may make this determination before the object factory 98 generates the
data-transfer objects 86, and cause the object factory to instantiate only the
data-transfer objects that the application 80 needs. In addition, because the
data-transfer objects 86 include the addresses of the interface memory 48 where the
respective buffers 106 are located, the object factory 98 effectively defines the sizes
and locations of the buffers when it instantiates the data-transfer objects.

[63] For example, the object factory 98 instantiates the data-transfer
objects 86_1a and 86_1b in the following manner. First, the factory 98 downloads the
configuration data from the registry 72 and generates the common software code for
the data-transfer objects 86_1a and 86_1b. Next, the factory 98 instantiates the
data-transfer objects 86_1a and 86_1b as respective instances of the common software
code. That is, the message handler 64 effectively copies the common software code
to two locations of the handler memory 68 or to other program memory (not shown),
and executes one location as the object 86_1a and the other location as the object
86_1b.

[64] Still referring to FIGS. 3-5, after initialization of the host processor 42,
the data-processing application 80 processes data and sends data to and receives
data from the pipeline accelerator 44.

[65] An example of the data-processing application 80 sending data to the
accelerator 44 is discussed in conjunction with the channel 104_1.

[66] First, the thread 100_1 generates and publishes data to the data-transfer
object 86_1a. The thread 100_1 may generate data by operating on raw data that it
receives from the accelerator 44 (further discussed below) or from another source
(not shown) such as a sonar array or a database via the port 54.

[67] Then, the data-transfer object 86_1a loads the published data into the buffer 106_1.

[68] Next, the data-transfer object 86_1b determines that the buffer 106_1 has
been loaded with newly published data from the data-transfer object 86_1a. The
output reader object 92 may periodically instruct the data-transfer object 86_1b to
check the buffer 106_1 for newly published data. Alternatively, the output reader
object 92 notifies the data-transfer object 86_1b when the buffer 106_1 has received
newly published data. Specifically, the output queue object 96 generates and stores
a unique identifier (not shown) in response to the data-transfer object 86_1a storing the
published data in the buffer 106_1. In response to this identifier, the output reader
object 92 notifies the data-transfer object 86_1b that the buffer 106_1 contains newly
published data. Where multiple buffers 106 contain respective newly published data,
the output queue object 96 may record the order in which this data was
published, and the output reader object 92 may notify the respective data-transfer
objects 86_xb in the same order. Thus, the output reader object 92 and the output
queue object 96 synchronize the data transfer by causing the first data published to
be the first data that the respective data-transfer object 86_xb sends to the accelerator
44, the second data published to be the second data that the respective
data-transfer object 86_xb sends to the accelerator, etc. In another alternative where
multiple buffers 106 contain respective newly published data, the output reader and
output queue objects 92 and 96 may implement a priority scheme other than, or in
addition to, this first-in-first-out scheme. For example, suppose the thread 100_1
publishes first data, and subsequently the thread 100_2 publishes second data but
also publishes to the output queue object 96 a priority flag associated with the
second data. Because the second data has priority over the first data, the output
reader object 92 notifies the data-transfer object 86_3b of the published second data in
the buffer 106_3 before notifying the data-transfer object 86_1b of the published first
data in the buffer 106_1.
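
A minimal sketch of this ordering behavior, assuming a numeric priority in which a lower number is served first, is given below. The `OutputQueue` class, its methods, and the channel labels "A" and "B" are hypothetical; the specification does not say how the queue and reader objects are implemented.

```python
import heapq
import itertools


class OutputQueue:
    """Records publications; a lower priority number is served first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # publication order breaks ties (FIFO)

    def record(self, channel_id, priority=1):
        """Store an identifier for newly published data on a channel."""
        heapq.heappush(self._heap, (priority, next(self._order), channel_id))

    def next_channel(self):
        """Return the channel to notify next, or None if nothing is pending."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = OutputQueue()
queue.record("A")              # first data, no priority flag
queue.record("B", priority=0)  # second data, published with a priority flag
queue.record("A")              # third data, no priority flag
notified = [queue.next_channel() for _ in range(3)]
```

Here the flagged publication on channel "B" is notified first, and the two unflagged publications on channel "A" follow in the order in which they were published.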

[69] Then, the data-transfer object 86_1b retrieves the published data from
the buffer 106_1 and formats the data in a predetermined manner. For example, the
object 86_1b generates a message that includes the published data (i.e., the payload)
and a header that, e.g., identifies the destination of the data within the accelerator
44. This message may have an industry-standard format such as the Rapid IO
(input/output) format. Because the generation of such a message is conventional, it
is not discussed further.
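
For illustration only, the sketch below builds and parses a message with a small fixed header. The header layout (a 16-bit destination identifier followed by a 16-bit payload length) is an assumption chosen for this example; it is not the Rapid IO format and is not taken from the specification.

```python
import struct

# Hypothetical header: destination id and payload length, both big-endian uint16.
HEADER = struct.Struct(">HH")


def format_message(destination, payload):
    """Prefix the payload with a header that identifies its destination."""
    return HEADER.pack(destination, len(payload)) + payload


def recover_data(message):
    """Split a message back into its destination id and payload."""
    destination, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    return destination, payload
```

A receiver such as the accelerator would apply the equivalent of `recover_data` to separate the payload from the header before routing the data internally.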

[70] After the data-transfer object 86_1b formats the published data, it sends
the formatted data to the communication object 88.

[71] Next, the communication object 88 sends the formatted data to the
pipeline accelerator 44 via the bus 50. The communication object 88 is designed to
implement the communication protocol (e.g., Rapid IO, TCP/IP) used to transfer data
between the host processor 42 and the accelerator 44. For example, the
communication object 88 implements the required handshaking and other transfer
parameters (e.g., arbitrating the sending and receiving of messages on the bus 50)
that the protocol requires. Alternatively, the data-transfer object 86_xb can implement
the communication protocol and the communication object 88 can be omitted.
However, this latter alternative is less efficient because it requires all the
data-transfer objects 86_xb to include additional code and functionality.

[72] The pipeline accelerator 44 then receives the formatted data, recovers
the data from the message (e.g., separates the data from the header if there is a
header), directs the data to the proper destination within the accelerator, and
processes the data.

[73] Still referring to FIGS. 3-5, an example of the pipeline accelerator 44
(FIG. 3) sending data to the host processor 42 (FIG. 3) is discussed in conjunction
with the channel 104_2.

[74] First, the pipeline accelerator 44 generates and formats data. For
example, the accelerator 44 generates a message that includes the data payload
and a header that, e.g., identifies the destination threads 100_1 and 100_2, which are
the threads that are to receive and process the data. As discussed above, this
message may have an industry-standard format such as the Rapid IO (input/output)
format.

[75] Next, the accelerator 44 drives the formatted data onto the bus 50 in a
conventional manner.

[76] Then, the communication object 88 receives the formatted data from
the bus 50 and provides the formatted data to the data-transfer object 86_2b. In one
embodiment, the formatted data is in the form of a message, and the communication
object 88 analyzes the message header (which, as discussed above, identifies the
destination threads 100_1 and 100_2) and provides the message to the data-transfer
object 86_2b in response to the header. In another embodiment, the communication
object 88 provides the message to all of the data-transfer objects 86_xb, each of which
analyzes the message header and processes the message only if its function is to
provide data to the destination threads 100_1 and 100_2. Consequently, in this
example, only the data-transfer object 86_2b processes the message.
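
The two delivery embodiments can be contrasted in a short sketch. The `TransferObject` class, the dictionary-based message, and the object name "86_4b" are hypothetical and serve only to make the example concrete.

```python
class TransferObject:
    """Hypothetical receiving-side data-transfer object."""

    def __init__(self, name, serves):
        self.name = name
        self.serves = frozenset(serves)  # destination threads this object feeds
        self.received = []

    def accepts(self, message):
        return self.serves == frozenset(message["destinations"])

    def process(self, message):
        self.received.append(message["payload"])


def deliver_routed(message, objects):
    """The communication object inspects the header and picks the recipient."""
    for obj in objects:
        if obj.accepts(message):
            obj.process(message)
            return obj.name
    return None


def deliver_broadcast(message, objects):
    """Every object is offered the message; each decides whether to process it."""
    processed = []
    for obj in objects:
        if obj.accepts(message):  # in this embodiment the object does the check
            obj.process(message)
            processed.append(obj.name)
    return processed
```

Both functions reach the same recipient; they differ only in which component is responsible for reading the header.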

[77] Next, the data-transfer object 86_2b loads the data received from the
communication object 88 into the buffer 106_2. For example, if the data is contained
within a message payload, the data-transfer object 86_2b recovers the data from the
message (e.g., by stripping the header) and loads the recovered data into the
buffer 106_2.

[78] Then, the data-transfer object 86_2a determines that the buffer 106_2 has
received new data from the data-transfer object 86_2b. The input reader object 90
may periodically instruct the data-transfer object 86_2a to check the buffer 106_2 for
newly received data. Alternatively, the input reader object 90 notifies the
data-transfer object 86_2a when the buffer 106_2 has received newly published data.
Specifically, the input queue object 94 generates and stores a unique identifier (not
shown) in response to the data-transfer object 86_2b storing the published data in the
buffer 106_2. In response to this identifier, the input reader object 90 notifies the
data-transfer object 86_2a that the buffer 106_2 contains newly published data. As
discussed above in conjunction with the output reader and output queue objects 92
and 96, where multiple buffers 106 contain respective newly published data, the
input queue object 94 may record the order in which this data was published, and the
input reader object 90 may notify the respective data-transfer objects 86_xa in the
same order. Alternatively, where multiple buffers 106 contain respective newly
published data, the input reader and input queue objects 90 and 94 may implement a
priority scheme other than, or in addition to, this first-in-first-out scheme.

[79] Next, the data-transfer object 86_2a transfers the data from the buffer 106_2 to the
subscriber threads 100_1 and 100_2, which perform respective operations on the data.

[80] Referring to FIG. 5, an example of one thread receiving and processing
data from another thread is discussed in conjunction with the thread 100_4 receiving
and processing data published by the thread 100_3.

[81] In one embodiment, the thread 100_3 publishes the data directly to the
thread 100_4 via the optional connection (dashed line) 102.

[82] In another embodiment, the thread 100_3 publishes the data to the
thread 100_4 via the channels 104_5 and 104_6. Specifically, the data-transfer object
86_5a loads the published data into the buffer 106_5. Next, the data-transfer object 86_5b
retrieves the data from the buffer 106_5 and transfers the data to the communication
object 88, which publishes the data to the data-transfer object 86_6b. Then, the
data-transfer object 86_6b loads the data into the buffer 106_6. Next, the data-transfer
object 86_6a transfers the data from the buffer 106_6 to the thread 100_4. Alternatively,
because the data is not being transferred via the bus 50, one may modify the
data-transfer object 86_5b such that it loads the data directly into the buffer 106_6, thus
bypassing the communication object 88 and the data-transfer object 86_6b. But
modifying the data-transfer object 86_5b to be different from the other data-transfer
objects 86 may increase the complexity, and reduce the modularity, of the message
handler 64.

[83] Still referring to FIG. 5, additional data-transfer techniques are
contemplated. For example, a single thread may publish data to multiple locations
within the pipeline accelerator 44 (FIG. 3) via respective multiple channels.
Alternatively, as discussed in previously cited U.S. Patent App. Serial Nos.
10/684,102 entitled IMPROVED COMPUTING ARCHITECTURE AND RELATED
SYSTEM AND METHOD and 10/683,929 entitled PIPELINE ACCELERATOR FOR
IMPROVED COMPUTING ARCHITECTURE AND RELATED SYSTEM AND
METHOD, the accelerator 44 may receive data via a single channel 104 and provide
it to multiple locations within the accelerator. Furthermore, multiple threads (e.g.,
threads 100_1 and 100_2) may subscribe to data from the same channel (e.g., channel
104_2). In addition, multiple threads (e.g., threads 100_2 and 100_3) may publish data to
the same location within the accelerator 44 via the same channel (e.g., channel
104_3), although the threads may publish data to the same accelerator location via
respective channels 104.

[84] FIG. 6 is a functional block diagram of the exception manager 82, the
data-transfer objects 86, and the interface memory 48 according to an embodiment
of the invention.

[85] The exception manager 82 receives and logs exceptions that may
occur during the initialization or operation of the pipeline accelerator 44 (FIG. 3).
Generally, an exception is a designer-defined event where the accelerator 44 acts in
an undesired manner. For example, a buffer (not shown) that overflows may be an
exception, and thus cause the accelerator 44 to generate an exception message and
send it to the exception manager 82. Generation of an exception message is
discussed in previously cited U.S. Patent App. Serial No. 10/683,929 entitled
PIPELINE ACCELERATOR FOR IMPROVED COMPUTING ARCHITECTURE AND
RELATED SYSTEM AND METHOD.

[86] The exception manager 82 may also handle exceptions that occur
during the initialization or operation of the pipeline accelerator 44 (FIG. 3). For
example, if the accelerator 44 includes a buffer (not shown) that overflows, then the
exception manager 82 may cause the accelerator to increase the size of the buffer to
prevent future overflow. Or, if a section of the accelerator 44 malfunctions, the
exception manager 82 may cause another section of the accelerator or the
data-processing application 80 to perform the operation that the malfunctioning
section was intended to perform. Such exception handling is further discussed
below and in previously cited U.S. Patent App. Serial No. 10/683,929 entitled
PIPELINE ACCELERATOR FOR IMPROVED COMPUTING ARCHITECTURE AND
RELATED SYSTEM AND METHOD.

[87] To log and/or handle accelerator exceptions, the exception manager 82
subscribes to data from one or more subscriber threads 100 (FIG. 5) and determines
from this data whether an exception has occurred.

[88] In one alternative, the exception manager 82 subscribes to the same
data as the subscriber threads 100 (FIG. 5) subscribe to. Specifically, the manager
82 receives this data via the same respective channels 104_s (which include, e.g.,
channel 104_2 of FIG. 5) from which the subscriber threads 100 (which include, e.g.,
threads 100_1 and 100_2 of FIG. 5) receive the data. Consequently, the channels 104_s
provide this data to the exception manager 82 in the same manner that they provide
this data to the subscriber threads 100.

[89] In another alternative, the exception manager 82 subscribes to data
from dedicated channels 104 (not shown), which may receive data from sections of
the accelerator 44 (FIG. 3) that do not provide data to the threads 100 via the
subscriber channels 104_s. Where such dedicated channels 104 are used, the object
factory 98 (FIG. 4) generates the data-transfer objects 86 for these channels during
initialization of the host processor 42 as discussed above in conjunction with FIG. 4.
The exception manager 82 may subscribe to the dedicated channels 104 exclusively
or in addition to the subscriber channels 104_s.

[90] To determine whether an exception has occurred, the exception
manager 82 compares the data to exception codes stored in a registry (not shown)
within the memory 66 (FIG. 3). If the data matches one of the codes, then the
exception manager 82 determines that the exception corresponding to the matched
code has occurred.

[91] In another alternative, the exception manager 82 analyzes the data to
determine if an exception has occurred. For example, the data may represent the
result of an operation performed by the accelerator 44. The exception manager 82
determines whether the data contains an error, and, if so, determines that an
exception has occurred and the identity of the exception.

[92] After determining that an exception has occurred, the exception
manager 82 logs, e.g., the corresponding exception code and the time of
occurrence, for later use such as during a debug of the accelerator 44. The
exception manager 82 may also determine and convey the identity of the exception
to, e.g., the system designer, in a conventional manner.
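
The code-matching alternative and the logging step can be sketched together as follows. The code values, the exception names, and the `check_for_exception` function are hypothetical; the specification does not list actual exception codes.

```python
import time

# Hypothetical exception codes, as they might be stored in the registry.
EXCEPTION_CODES = {0x01: "buffer overflow", 0x02: "section malfunction"}


def check_for_exception(data, log, clock=time.time):
    """Compare data to the known codes; log code, name, and time on a match."""
    name = EXCEPTION_CODES.get(data)
    if name is None:
        return None  # the data is not an exception code
    log.append((data, name, clock()))
    return name
```

Passing the clock as a parameter is a convenience of this sketch so that the logged time can be controlled during testing.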

[93] Alternatively, in addition to logging the exception, the exception
manager 82 may implement an appropriate procedure for handling the exception.
For example, the exception manager 82 may handle the exception by sending an
exception-handling instruction to the accelerator 44, the data-processing application
80, or the configuration manager 84. The exception manager 82 may send the
exception-handling instruction to the accelerator 44 either via the same respective
channels 104_p (e.g., channel 104_1 of FIG. 5) through which the publisher threads 100
(e.g., thread 100_1 of FIG. 5) publish data, or through dedicated exception-handling
channels 104 (not shown) that operate as described above in conjunction with FIG.
5. If the exception manager 82 sends instructions via other channels 104, then the
object factory 98 (FIG. 4) generates the data-transfer objects 86 for these channels
during initialization of the host processor 42 as described above in conjunction with
FIG. 4. The exception manager 82 may publish exception-handling instructions to
the data-processing application 80 and to the configuration manager 84 either
directly (as indicated by the dashed lines 85 and 89 in FIG. 4) or via the
channels 104_dpa1 and 104_dpa2 (application 80) and channels 104_cm1 and 104_cm2
(configuration manager 84), which the object factory 98 also generates during the
initialization of the host processor 42.

[94] Still referring to FIG. 6, as discussed below, the exception-handling
instructions may cause the accelerator 44, data-processing application 80, or
configuration manager 84 to handle the corresponding exception in a variety of
ways.

[95] When sent to the accelerator 44, the exception-handling instruction
may change the soft configuration or the functioning of the accelerator. For
example, as discussed above, if the exception is a buffer overflow, the instruction
may change the accelerator's soft configuration (i.e., by changing the contents of a
soft configuration register) to increase the size of the buffer. Or, if a section of the
accelerator 44 that performs a particular operation is malfunctioning, the instruction
may change the accelerator's functioning by causing the accelerator to take the
disabled section "off line." In this latter case, the exception manager 82 may, via
additional instructions, cause another section of the accelerator 44, or the
data-processing application 80, to "take over" the operation from the disabled
accelerator section as discussed below. Altering the soft configuration of the
accelerator 44 is further discussed in previously cited U.S. Patent App. Serial No.
10/683,929 entitled PIPELINE ACCELERATOR FOR IMPROVED COMPUTING
ARCHITECTURE AND RELATED SYSTEM AND METHOD (Attorney Docket No.
1934-13-3).

[96] When sent to the data-processing application 80, the
exception-handling instructions may cause the data-processing application to "take
over" the operation of a disabled section of the accelerator 44 that has been taken
off line. Although the processing unit 62 (FIG. 3) may perform this operation more
slowly and less efficiently than the accelerator 44, this may be preferable to not
performing the operation at all. This ability to shift the performance of an operation
from the accelerator 44 to the processing unit 62 increases the flexibility, reliability,
maintainability, and fault-tolerance of the peer-vector machine 40 (FIG. 3).

[97] And when sent to the configuration manager 84, the
exception-handling instruction may cause the configuration manager to change the
hard configuration of the accelerator 44 so that the accelerator can continue to
perform the operation of a malfunctioning section that has been taken off line. For
example, if the accelerator 44 has an unused section, then the configuration
manager 84 may configure this unused section to perform the operation that was to
be performed by the malfunctioning section. If the accelerator 44 has no unused
section, then the configuration manager 84 may reconfigure a section of the
accelerator that currently performs a first operation to perform a second operation of,
i.e., take over for, the malfunctioning section. This technique may be useful where
the first operation can be omitted but the second operation cannot, or where the
data-processing application 80 is more suited to perform the first operation than it is
the second operation. This ability to shift the performance of an operation from one
section of the accelerator 44 to another section of the accelerator increases the
flexibility, reliability, maintainability, and fault-tolerance of the peer-vector machine 40
(FIG. 3).
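
One way to picture this selection policy is the sketch below, which prefers an unused section and otherwise falls back to a section whose current operation is marked as expendable. The `choose_section` function, the section names, and the `expendable` set are assumptions for illustration; the specification does not prescribe a selection algorithm.

```python
def choose_section(sections, failed, expendable=()):
    """Pick the section that should take over for the failed section.

    `sections` maps a section name to its current operation (None if unused).
    `expendable` holds operations that may be dropped or moved elsewhere.
    """
    # First choice: any unused section other than the failed one.
    for name, operation in sections.items():
        if name != failed and operation is None:
            return name
    # Second choice: a section whose current operation can be given up.
    for name, operation in sections.items():
        if name != failed and operation in expendable:
            return name
    return None  # no section is available to take over
```

A caller would then load firmware for the failed section's operation into whichever section the function returns.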

[98] Referring to FIG. 7, the configuration manager 84 loads the firmware
that defines the hard configuration of the accelerator 44 during initialization of the
peer-vector machine 40 (FIG. 3), and, as discussed above in conjunction with FIG. 6,
may load firmware that redefines the hard configuration of the accelerator in
response to an exception according to an embodiment of the invention. As
discussed below, the configuration manager 84 often reduces the complexity of
designing and modifying the accelerator 44 and increases the fault-tolerance,
reliability, maintainability, and flexibility of the peer-vector machine 40 (FIG. 3).

[99] During initialization of the peer-vector machine 40, the configuration
manager 84 receives configuration data from the accelerator configuration registry
70, and loads configuration firmware identified by the configuration data. The
configuration data are effectively instructions to the configuration manager 84 for
loading the firmware. For example, if a section of the initialized accelerator 44
performs an FFT, then one designs the configuration data so that the firmware
loaded by the manager 84 implements an FFT in this section of the accelerator.
Consequently, one can modify the hard configuration of the accelerator 44 by merely
generating or modifying the configuration data before initialization of the peer-vector
machine 40. Because generating and modifying the configuration data is often
easier than generating and modifying the firmware directly, particularly if the
configuration data can instruct the configuration manager 84 to load existing
firmware from a library, the configuration manager 84 typically reduces the
complexity of designing and modifying the accelerator 44.

[100] Before the configuration manager 84 loads the firmware identified by
the configuration data, the configuration manager determines whether the
accelerator 44 can support the configuration defined by the configuration data. For
example, if the configuration data instructs the configuration manager 84 to load
firmware for a particular PLIC (not shown) of the accelerator 44, then the
configuration manager 84 confirms that the PLIC is present before loading the data.
If the PLIC is not present, then the configuration manager 84 halts the initialization of
the accelerator 44 and notifies an operator that the accelerator does not support the
configuration.
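
The check-before-load behavior can be sketched as follows. The `load_configuration` function, the `UnsupportedConfiguration` exception, and the shape of the configuration entries are hypothetical; raising an exception here merely stands in for halting initialization and notifying an operator.

```python
class UnsupportedConfiguration(Exception):
    """Raised when the configuration names hardware that is not present."""


def load_configuration(config, present_plics, firmware_library):
    """Verify every targeted PLIC is present, then select its firmware."""
    missing = [entry["plic"] for entry in config if entry["plic"] not in present_plics]
    if missing:
        # Nothing is loaded if any targeted PLIC is absent.
        raise UnsupportedConfiguration(f"PLICs not present: {missing}")
    return {entry["plic"]: firmware_library[entry["firmware"]] for entry in config}
```

Performing the presence check for all entries before selecting any firmware means a bad configuration is rejected as a whole rather than partially applied.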

[101] After the configuration manager 84 confirms that the accelerator
supports the defined configuration, the configuration manager loads the firmware into
the accelerator 44, which sets its hard configuration with the firmware, e.g., by
loading the firmware into the firmware memory 52. Typically, the configuration
manager 84 sends the firmware to the accelerator 44 via one or more channels 104_t
that are similar in generation, structure, and operation to the channels 104 of FIG. 5.
The configuration manager 84 may also receive data from the accelerator 44 via one
or more channels 104_u. For example, the accelerator 44 may send confirmation of
the successful setting of its hard configuration to the configuration manager 84.

[102] After the hard configuration of the accelerator 44 is set, the
configuration manager 84 may set the accelerator's hard configuration in response to
an exception-handling instruction from the exception manager 82 as discussed
above in conjunction with FIG. 6. In response to the exception-handling instruction,
the configuration manager 84 downloads the appropriate configuration data from the
registry 70, loads reconfiguration firmware identified by the configuration data, and
sends the firmware to the accelerator 44 via the channels 104_t. The configuration
manager 84 may receive confirmation of successful reconfiguration from the
accelerator 44 via the channels 104_u. As discussed above in conjunction with FIG.
6, the configuration manager 84 may receive the exception-handling instruction
directly from the exception manager 82 via the line 89 (FIG. 4) or indirectly via the
channels 104_cm1 and 104_cm2.

[103] The configuration manager 84 may also reconfigure the
data-processing application 80 in response to an exception-handling instruction from
the exception manager 82 as discussed above in conjunction with FIG. 6. In
response to the exception-handling instruction, the configuration manager 84
instructs the data-processing application 80 to reconfigure itself to perform an
operation that, due to malfunction or other reason, the accelerator 44 cannot
perform. The configuration manager 84 may so instruct the data-processing
application 80 directly via the line 87 (FIG. 4) or indirectly via channels 104_dpa1 and
104_dpa2, and may receive information from the data-processing application, such as
confirmation of successful reconfiguration, directly or via another channel 104 (not
shown). Alternatively, the exception manager 82 may send an exception-handling
instruction to the data-processing application 80, which reconfigures itself, thus
bypassing the configuration manager 84.

[104] Still referring to FIG. 7, alternate embodiments of the configuration
manager 84 are contemplated. For example, the configuration manager 84 may
reconfigure the accelerator 44 or the data-processing application 80 for reasons
other than the occurrence of an accelerator malfunction.

[105] The preceding discussion is presented to enable a person skilled in the
art to make and use the invention. Various modifications to the embodiments will be
readily apparent to those skilled in the art, and the generic principles herein may be
applied to other embodiments and applications without departing from the spirit and
scope of the present invention. Thus, the present invention is not intended to be
limited to the embodiments shown, but is to be accorded the widest scope consistent
with the principles and features disclosed herein.

WHAT IS CLAIMED IS:

1. A computing machine, comprising:

a first buffer;

a processor coupled to the buffer and operable to,

execute an application, a first data-transfer object, and a second data-
transfer object,

publish data under the control of the application,

load the published data into the buffer under the control of the first
data-transfer object, and

retrieve the published data from the buffer under the control of the
second data-transfer object.

2. The computing machine of claim 1 wherein the first and second data-
transfer objects respectively comprise first and second instances of the same object
code.

3. The computing machine of claim 1 wherein the processor comprises:
a processing unit operable to execute the application and publish the data
under the control of the application; and

a data-transfer handler operable to execute the first and second data-transfer
objects, to load the published data into the buffer under the control of the first data-
transfer object, and to retrieve the published data under the control of the second
data-transfer object.

4. The computing machine of claim 1 wherein the processor is further
operable to execute a thread of the application and to publish the data under the
control of the thread.

5. The computing machine of claim 1 wherein the processor is further
operable to:

execute a queue object and a reader object;

store a queue value under the control of the queue object, the queue
value reflecting the loading of the published data into the buffer;

read the queue value under the control of the reader object;

notify the second software object that the published data occupies the
buffer under the control of the reader object and in response to the queue value; and

retrieve the published data from the storage location under the control
of the second data-transfer object and in response to the notification.

6. The computing machine of claim 1, further comprising:
a bus; and

wherein the processor is operable to execute a communication object and to
drive the retrieved data onto the bus under the control of the communication object.

7. The computing machine of claim 1, further comprising:
a second buffer; and

wherein the processor is operable to provide the retrieved data to the second
buffer under the control of the second data-transfer object.

8. The computing machine of claim 1 wherein the processor is further
operable to generate a message that includes a header and the retrieved data under
the control of the second data-transfer object.

9. The computing machine of claim 1 wherein:

the first and second data-transfer objects respectively comprise first and
second instances of the same object code; and

the processor is operable to execute an object factory and to generate the
object code under the control of the object factory.

10. A computing machine, comprising:
a first buffer;

a processor coupled to the buffer and operable to,

execute first and second data-transfer objects and an application,

retrieve data and load the retrieved data into the buffer under the
control of the first data-transfer object,

unload the data from the buffer under the control of the second data-
transfer object, and

process the unloaded data under the control of the application.

11. The computing machine of claim 10 wherein the first and second data-
transfer objects respectively comprise first and second instances of the same object
code.

12. The computing machine of claim 10 wherein the processor comprises:
a processing unit operable to execute the application and process the
unloaded data under the control of the application; and

a data-transfer handler operable to execute the first and second data-transfer
objects, to retrieve the data from the bus and load the data into the buffer under the
control of the first data-transfer object, and to unload the data from the buffer under
the control of the second data-transfer object.

13. The computing machine of claim 10 wherein the processor is further
operable to execute a thread of the application and to process the unloaded data
under the control of the thread.

14. The computing machine of claim 10 wherein the processor is further
operable to:

execute a queue object and a reader object;

store a queue value under the control of the queue object, the queue
value reflecting the loading of the published data into the first buffer;

read the queue value under the control of the reader object;

notify the second data-transfer object that the published data occupies
the buffer under the control of the reader object and in response to the queue value;
and

unload the published data from the buffer under the control of the
second data-transfer object and in response to the notification.

15. The computing machine of claim 10, further comprising:
a second buffer; and

wherein the processor is operable to retrieve the data from the second buffer
under the control of the first data-transfer object.

16. The computing machine of claim 10, further comprising:
a bus; and

wherein the processor is operable to execute a communication object, to
receive the data from the bus under the control of the communication object, and to
retrieve the data from the communication object under the control of the first data-
transfer object.

17. The computing machine of claim 10 wherein:

the first and second data-transfer objects respectively comprise first and
second instances of the same object code; and

the processor is operable to execute an object factory and to generate the
object code under the control of the object factory.

18. The computing machine of claim 10 wherein the processor is further
operable to recover the data from a message that includes a header and the data
under the control of the first data-transfer object.

19. A peer-vector machine, comprising:
a buffer;

a bus;

a processor coupled to the buffer and to the bus and operable to,

execute an application, first and second data-transfer objects, and a
communication object,

publish data under the control of the application,

load the published data into the buffer under the control of the first data-
transfer object,

retrieve the published data from the buffer under the control of the
second data-transfer object, and

drive the published data onto the bus under the control of the
communication object; and

a pipeline accelerator coupled to the bus and operable to receive the
published data from the bus and to process the received published data.

20. The peer-vector machine of claim 19 wherein:
the processor is further operable to construct a message that includes the
published data under the control of the second data-transfer object and to drive the
message onto the bus under the control of the communication object; and

the pipeline accelerator is operable to receive the message from the bus and
to recover the published data from the message.

21. The peer-vector machine of claim 19, further comprising:
a registry coupled to the host processor and operable to store object data; and
wherein the processor is operable to,
execute an object factory, and

to generate the first and second data-transfer objects and the
communication object from the object data under the control of the object factory.

22. A peer-vector machine, comprising:
a buffer;

a bus;

a pipeline accelerator coupled to the bus and operable to generate data and
to drive the data onto the bus; and

a processor coupled to the buffer and to the bus and operable to,

execute an application, first and second data-transfer objects, and a
communication object,

receive the data from the bus under the control of the communication
object,

load the received data into the buffer under the control of the first data-
transfer object,

unload the data from the buffer under the control of the second data-
transfer object, and

process the unloaded data under the control of the application.

23. The peer-vector machine of claim 22 wherein:

the pipeline accelerator is further operable to construct a message that
includes the data and to drive the message onto the bus; and

the processor is operable to,

receive the message from the bus under the control of the
communication object, and

recover the data from the message under the control of the first data-
transfer object.

24. The peer-vector machine of claim 22, further comprising:


a registry coupled to the host processor and operable to store object data; and

wherein the processor is operable to,

execute an object factory, and

to generate the first and second data-transfer objects and the
communication object from the object data under the control of the object factory.

25. A peer-vector machine, comprising:
a first buffer;
a bus;

a processor coupled to the buffer and to the bus and operable to,

execute a configuration manager, first and second data-transfer
objects, and a communication object,

load configuration firmware into the buffer under the control of the
configuration manager and the first data-transfer object,

retrieve the configuration firmware from the buffer under the control of
the second data-transfer object, and

drive the configuration firmware onto the bus under the control of the
communication object; and

a pipeline accelerator coupled to the bus and operable to receive the
configuration firmware and to configure itself with the configuration firmware.

26. The peer-vector machine of claim 25 wherein:

the processor is further operable to construct a message that includes the
configuration firmware under the control of the second data-transfer object and to
drive the message onto the bus under the control of the communication object; and

the pipeline accelerator is operable to receive the message from the bus and
to recover the configuration firmware from the message.

27. The peer-vector machine of claim 26, further comprising:

a registry coupled to the processor and operable to store configuration data;
and

wherein the processor is operable to locate the configuration firmware from
the configuration data under the control of the configuration manager.

28. The peer-vector machine of claim 25, further comprising:
a second buffer; and

wherein the processor is operable to,

execute an application and third and fourth data-transfer objects,

generate a configuration instruction under the control of the
configuration manager,

load the configuration instruction into the second buffer under the
control of the third data-transfer object,

retrieve the configuration instruction from the second buffer under the
control of the fourth data-transfer object, and

configure the application to perform an operation corresponding to the
configuration instruction under the control of the application.

29. The peer-vector machine of claim 25 wherein the processor is operable
to:

generate a configuration instruction under the control of the configuration
manager; and

configure the application to perform an operation corresponding to the
configuration instruction under the control of the application.

30. The peer-vector machine of claim 26 wherein the configuration
manager is operable to confirm that the pipeline accelerator supports a configuration
defined by the configuration data before loading the firmware.

31. A peer-vector machine, comprising:
a first buffer;

a bus;

a pipeline accelerator coupled to the bus and operable to generate exception
data and to drive the exception data onto the bus; and

a processor coupled to the buffer and to the bus and operable to,

execute an exception manager, first and second data-transfer objects,
and a communication object,

receive the exception data from the bus under the control of the
communication object,

load the received exception data into the buffer under the control of the
first data-transfer object,

unload the exception data from the buffer under the control of the
second data-transfer object, and

process the unloaded exception data under the control of the exception
manager.

32. The peer-vector machine of claim 31 wherein:

the pipeline is further operable to construct a message that includes the
exception data and to drive the message onto the bus; and

the processor is operable to receive the message from the bus under the
control of the communication object and to recover the exception data from the
message under the control of the first data-transfer object.

33. The peer-vector machine of claim 31, further comprising:
a second buffer;

wherein the processor is further operable to,

execute a configuration manager and third and fourth data-transfer
objects,

generate configuration firmware under the control of the configuration
manager in response to the exception data,

load the configuration firmware into the second buffer under the control
of the third data-transfer object,

unload the configuration instruction from the second buffer under the
control of the fourth data-transfer object, and

drive the configuration firmware onto the bus under the control of the
communication object; and

wherein the pipeline accelerator is operable to receive the configuration
firmware from the bus and reconfigure itself with the firmware.

34. The peer-vector machine of claim 31 wherein the processor is further
operable to:

execute an application and a configuration manager;

generate a configuration instruction under the control of the configuration
manager in response to the exception data; and

reconfigure the application under the control of the application in response to
the configuration instruction.

35. A peer-vector machine, comprising:

a configuration registry operable to store configuration data;

a processor coupled to the configuration registry and operable to locate
configuration firmware from the configuration data; and

a pipeline accelerator coupled to the processor and operable to configure
itself with the configuration firmware.

36. A peer-vector machine, comprising:

a configuration registry operable to store configuration data;

a pipeline accelerator; and

a processor coupled to the configuration registry and to the pipeline
accelerator and operable to retrieve configuration firmware in response to the
configuration data and to configure the pipeline accelerator with the configuration
firmware.

37. A method, comprising:
publishing data with an application;

loading the published data into a first buffer with a first data-transfer object;
and

retrieving the published data from the buffer with a second data-transfer
object.

38. The method of claim 37 wherein publishing the data comprises
publishing the data with a thread of the application.

39. The method of claim 37, further comprising:

generating a queue value that corresponds to the presence of the published
data in the buffer;

notifying the second data-transfer object that the published data occupies the
buffer in response to the queue value; and

wherein retrieving the published data comprises retrieving the published data
from the storage location with the second data-transfer object in response to the
notification.

40. The method of claim 37, further comprising driving the retrieved data
onto a bus with a communication object.

41. The method of claim 37, further comprising loading the retrieved data
into a second buffer with the second data-transfer object.

42. The method of claim 37, further comprising:

generating a header for the retrieved data with the second data-transfer
object; and

combining the header and the retrieved data into a message with the second
data-transfer object.

43. The method of claim 37, further comprising:
generating data-transfer object code with an object factory;

generating the first data-transfer object as a first instance of the object code;
and

generating the second data-transfer object as a second instance of the object
code.

44. The method of claim 37, further comprising receiving and processing
the data from the second data-transfer object with a pipeline accelerator.

45. A method, comprising:

retrieving data and loading the retrieved data into a first buffer with a first data-
transfer object;

unloading the data from the buffer with a second data-transfer object; and
processing the unloaded data with an application.

46. The method of claim 45 wherein processing the unloaded data
comprises processing the unloaded data with a thread of the application.

47. The method of claim 45, further comprising:

generating a queue value that corresponds to the presence of the data
in the buffer;

notifying the second data-transfer object that the data occupies the
buffer in response to the queue value; and

wherein unloading the data comprises unloading the data from the
buffer with the first data-transfer object in response to the notification.

48. The method of claim 45 wherein retrieving the data comprises
retrieving the data from a second buffer with the first data-transfer object.


49. The method of claim 45, further comprising:

receiving the data from a bus with a communication object; and
wherein retrieving the data comprises retrieving the data from the
communication object with the first data-transfer object.

50. The method of claim 46, further comprising providing the data to the
first data-transfer object with a pipeline accelerator.

51. A method, comprising:

publishing data with an application running on a processor;
loading the published data into a buffer with a first data-transfer object running
on the processor;

retrieving the published data from the buffer with a second data-transfer object
running on the processor;

driving the retrieved published data onto a bus with a communication object
running on the processor; and

receiving the published data from the bus and processing the published data
with a pipeline accelerator.

52. The method of claim 51, further comprising:

generating a message that includes a header and the published data with the
second data-transfer object;

wherein driving the data onto the bus comprises driving the message onto the
bus with the communication object; and

receiving and processing the published data comprises receiving the message
and recovering the published data from the message with the pipeline accelerator.

53. A method, comprising:

generating data and driving the data onto a bus with a pipeline accelerator;
receiving the data from the bus with the communication object;
loading the received data into a buffer with a first data-transfer object;
unloading the data from the buffer with a second data-transfer object; and
processing the unloaded data with an application.

54. The method of claim 53, further comprising:
wherein generating the data comprises constructing a message that includes
a header and the data with the pipeline accelerator;

wherein driving the data comprises driving the message onto the bus with the
pipeline accelerator;

wherein receiving the data comprises receiving the message from the bus
with the communication object; and

recovering the data from the message with the first data-transfer object.

55. A method, comprising:

retrieving configuration firmware with a configuration manager;

loading the configuration firmware into a first buffer with a first communication
object;

retrieving the configuration firmware from the buffer with a second
communication object;

driving the configuration firmware onto a bus with a communication object;
receiving the configuration firmware with a pipeline accelerator; and
configuring the pipeline accelerator with the configuration firmware.

56. The method of claim 55, further comprising:

generating a configuration instruction with the configuration manager; and
configuring the application to perform an operation corresponding to the
configuration instruction.

57. The method of claim 55, further comprising:

generating a configuration instruction with the configuration manager;

loading the configuration instruction into a second buffer with a third
communication object;

retrieving the configuration instruction from the second buffer with a fourth
communication object; and

configuring the application to perform an operation corresponding to the
configuration instruction.

58. A method, comprising:

generating exception data and driving the exception data onto a bus with a
pipeline accelerator;

receiving the exception data from the bus with a communication object;
loading the received exception data into a buffer with a first data-transfer
object;

unloading the exception data from the buffer with a second data-transfer
object; and

processing the unloaded exception data with an exception manager.

59. The method of claim 58, further comprising:

retrieving configuration firmware with a configuration manager in response to
the exception data;

loading the configuration firmware into a second buffer with a third transfer
object;

unloading the configuration instruction from the second buffer with a fourth
data-transfer object;

driving the configuration firmware onto the bus with the communication object;
and

reconfiguring the pipeline accelerator with the configuration firmware.

60. The method of claim 58, further comprising:

generating a configuration instruction with a configuration manager in
response to the error data; and

reconfiguring the application in response to the configuration instruction.

61. A method, comprising:

retrieving configuration firmware pointed to by configuration data stored in a
configuration registry during an initialization of a computing machine; and

configuring a pipeline accelerator of the computing machine with the
configuration firmware.
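
For illustration only, the publish, load, notify, and retrieve sequence recited in claims 37 and 39 might be sketched as follows. The class and function names are hypothetical; a `collections.deque` stands in for the buffer and a `queue.Queue` stands in for the queue value and notification, neither of which is specified by the claims. Both data-transfer objects are instances of the same class, loosely mirroring the "instances of the same object code" language of claim 2.

```python
import queue
from collections import deque


class DataTransferObject:
    """Hypothetical data-transfer object; both ends instantiate this class."""

    def __init__(self, buffer: deque):
        self.buffer = buffer

    def load(self, data, notifications: queue.Queue) -> None:
        self.buffer.append(data)
        # Queue value reflecting that published data now occupies the buffer.
        notifications.put(1)

    def retrieve(self, notifications: queue.Queue):
        # Wait for the notification, then unload the oldest buffered item.
        notifications.get()
        return self.buffer.popleft()


def publish_and_transfer(items):
    """Publish each item, load it via a first object, retrieve via a second."""
    buffer = deque()
    notifications = queue.Queue()
    first = DataTransferObject(buffer)
    second = DataTransferObject(buffer)
    for item in items:
        first.load(item, notifications)
    return [second.retrieve(notifications) for _ in items]
```

The sketch runs in a single thread, so every notification is posted before it is consumed; a multi-threaded embodiment would rely on the blocking behavior of `queue.Queue.get` to hold the second object until data is present.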
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